2
3
4
Number of clustering centers
80
85
90
95
MCNs without center loss
MCNs with center loss
FIGURE 3.4
Accuracy with different numbers of clustering centers for 20-layer MCNs with width 16-16-
32-64.
with a batch size of 128. The performance of MCNs with different values of θ is shown in
Fig. 3.7. First, only the effect of θ is evaluated; then the center loss is implemented via a
fine-tuning process. Performance is observed to be stable under variations of θ and λ.
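To illustrate how θ and λ act purely as trade-off weights, the following is a minimal sketch of a combined objective in the style described above, assuming a softmax cross-entropy term, a filter-reconstruction term weighted by θ, and a feature center-loss term weighted by λ. The function and tensor names are hypothetical, and the exact form of the MCN loss may differ.

```python
import torch
import torch.nn.functional as F

def mcn_style_loss(logits, labels, real_filters, reconstructed_filters,
                   features, class_centers, theta=1e-3, lam=1e-4):
    # Softmax cross-entropy on the class predictions.
    ce = F.cross_entropy(logits, labels)
    # Reconstruction term: keep the reconstructed (binarized and modulated)
    # filters close to their real-valued counterparts; weighted by theta.
    recon = sum((r - q).pow(2).sum()
                for r, q in zip(real_filters, reconstructed_filters))
    # Center-loss term: pull each sample's feature vector toward the
    # center of its class; weighted by lambda.
    center = (features - class_centers[labels]).pow(2).sum(dim=1).mean()
    return ce + 0.5 * theta * recon + 0.5 * lam * center
```

With this form, small changes to θ and λ only rescale the two auxiliary penalties, which is consistent with the observed stability.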
The number of clustering centers: We quantize with U = 2, 3, 4 clustering centers. In
this experiment, we investigate the effect of varying the number of clustering centers in
MCNs on CIFAR-10.
The results are shown in Fig. 3.4: accuracy increases with more clustering centers, and
the center loss further improves performance. However, to save storage and to allow a fair
comparison with other binary networks, we use two clustering centers for MCNs in all the
following experiments.
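The sketch below is a hypothetical illustration of what "U clustering centers" means in practice: a 1-D k-means over the real-valued weights followed by snapping each weight to its nearest center. With U = 2 this degenerates to a two-level (binary-style) quantizer. It is not the book's exact MCN quantization procedure.

```python
import torch

def quantize_to_centers(w, num_centers=2, iters=10):
    # Flatten the real-valued weights and run 1-D k-means (Lloyd's
    # algorithm) to find `num_centers` clustering centers.
    flat = w.detach().reshape(-1)
    centers = torch.linspace(flat.min().item(), flat.max().item(), num_centers)
    for _ in range(iters):
        assign = (flat[:, None] - centers[None, :]).abs().argmin(dim=1)
        for k in range(num_centers):
            mask = assign == k
            if mask.any():
                centers[k] = flat[mask].mean()
    # Snap every weight to its nearest center.
    assign = (flat[:, None] - centers[None, :]).abs().argmin(dim=1)
    return centers[assign].reshape(w.shape), centers

w = torch.randn(64, 16, 3, 3)                     # toy real-valued conv filters
q2, c2 = quantize_to_centers(w, num_centers=2)    # U = 2: two-level quantization
q4, c4 = quantize_to_centers(w, num_centers=4)    # U = 4: finer quantization
```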
Our binarized networks reduce storage by a factor of 32 in the convolutional layers compared
with the corresponding full-precision networks, where each real value takes 4 bytes (32 bits).
Since MCNs contain only one fully connected layer that is not binarized, the storage of the
whole network is significantly reduced.
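The factor of 32 follows directly from replacing a 32-bit real value with a single bit per convolutional weight; the short arithmetic check below (using an arbitrary 64-to-64-channel 3×3 layer as an example) makes this explicit.

```python
def conv_storage_bytes(c_in, c_out, k, bits_per_weight):
    # Weight storage of one k x k convolutional layer, ignoring biases,
    # scaling factors, and the single unbinarized fully connected layer.
    return c_out * c_in * k * k * bits_per_weight / 8

full_precision = conv_storage_bytes(64, 64, 3, bits_per_weight=32)  # 147456 bytes
binarized      = conv_storage_bytes(64, 64, 3, bits_per_weight=1)   #   4608 bytes
print(full_precision / binarized)   # 32.0 -> the factor-of-32 saving quoted above
```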
The architecture parameter K: The number of planes in each M-Filter, i.e., K, is also
evaluated. As shown in Fig. 3.5, using more planes in each M-Filter to reconstruct the
unbinarized filters yields better performance. For example, increasing K from 4 to 8 improves
performance by 1.02%. For simplicity, we choose K = 4 in the following experiments.
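To make the role of K concrete, the following is a minimal sketch, under our own assumptions about the modulation rule, of how a shared M-Filter with K planes could modulate binarized filters: each binarized filter is multiplied element-wise by every one of the K planes, so a larger K yields more reconstructed variants per filter. The tensor shapes and the function name are hypothetical; the exact MCN reconstruction may differ.

```python
import torch

def modulate_filters(binary_filters, m_filter):
    # binary_filters: (C_out, C_in, W, H) binarized (+1/-1) filters.
    # m_filter:       (K, W, H) shared M-Filter with K planes.
    # Element-wise modulation of every filter by every plane gives a
    # (C_out, C_in, K, W, H) tensor of reconstructed filter variants.
    return binary_filters.unsqueeze(2) * m_filter[None, None]

binary = torch.randn(64, 16, 3, 3).sign()         # toy binarized filters
m_filter = torch.rand(4, 3, 3)                    # K = 4 planes
print(modulate_filters(binary, m_filter).shape)   # torch.Size([64, 16, 4, 3, 3])
```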
The width of MCNs: CIFAR-10 is used to evaluate the effect of the width of Wide-
ResNets with MCNs. The accuracy and number of parameters are compared with a recent
binary CNN, LBCNN. The basic width of the stage (the number of convolution kernels
per layer) is set to 16-16-32-64. To compare with LBCNN, we build 20-layer MCNs with
basic block-c (Fig. 3.9), whose depth is the same as in LBCNN. We also use other network
widths to evaluate the effect of width on MCNs.
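As a small illustration of the width notation used here and in Table 3.1, the helper below, which is our own and not part of the MCN code, expands the 16-16-32-64 base widths by a WRN-style widening factor.

```python
def stage_widths(base=(16, 16, 32, 64), widen=1):
    # Per-stage channel counts for a WRN-style MCN; `widen` scales every
    # stage, as in Wide-ResNets, to study the effect of network width.
    return [w * widen for w in base]

print(stage_widths())          # [16, 16, 32, 64]  -> baseline 20-layer MCN width
print(stage_widths(widen=2))   # [32, 32, 64, 128] -> a wider variant
```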
The results are shown in Table 3.1. The second column gives the width of each layer of
the MCNs; similar notation is used in [281]. The third column lists the number of parameters
of the MCNs and of the 20-layer LBCNN with its best result. The fourth column shows the
accuracy of the baselines, which are trained with the Wide-ResNets (WRNs) structure using
the same depth and width as the MCNs. The last two